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We present an efficient implementation for the evaluation of Wigner 3j, 6j, and 9j symbols. These 
represent numerical transformation coefficients that are used in the quantum theory of angular 
momentum. They can be expressed as sums and square roots of ratios of integers. The integers can 
be very large due to factorials. We avoid numerical precision loss due to cancellation through the 
use of multi-word integer arithmetic for exact accumulation of all sums. A fixed relative accuracy is 
maintained as the limited number of floating-point operations in the final step only incur rounding 
errors in the least significant bits. Time spent to evaluate large multi-word integers is in turn reduced 
by using explicit prime factorisation of the ingoing factorials, thereby improving execution speed. 
Comparison with existing routines shows the efficiency of our approach and we therefore provide a 
computer code based on this work. 


I. INTRODUCTION 

Wigner 3j, 6j, and 9j symbols {xj symbols) are used 
in physics and chemistry whenever one deals with an¬ 
gular momenta in quantum mechanical systems [SHH]. 
These functions correspond to transformation coefficients 
between different spin representations. The input argu¬ 
ments are six or nine integers, or half integers, that rep¬ 
resent different angular momenta j and projection quan¬ 
tum numbers \m\ < j. Because of their frequent appear¬ 
ance in scientific computing, standard library functions 
that evaluate Wigner symbols are available in several 
mathematics software systems [11113112 and computa¬ 
tional libraries [9]. In fact, the quantum spin properties 
of electrons, protons, neutrons and atoms imply that an¬ 
gular momentum recoupling is a general and important 
ingredient in the theoretical modeling of quantum many- 
body systems |8]. Well known examples are found in 
quantum chemistry d , condensed matter |5] , atomic m, 
and nuclear physics |16j . More recently, the mathemat¬ 
ical apparatus of quantum-mechanical angular momen¬ 
tum coupling has found a prominent role also in quantum 
gravity and quantum computing applications [5]- 

In particular, the Wigner symbols enter at a criti¬ 
cal stage in the computational chain of quantum many- 
body modeling; namely when calculating the strength 
of the interaction between pairs or triples of particles. 
The evaluation of these so called interaction matrix ele¬ 
ments can involve intermediate angular-momentum sums 
that extend to very large values {j ~ 100). Eventu¬ 
ally, a sum over all possible particle pairs or triples 
within the many-body system gives the total interac¬ 
tion energy. The continued efforts to improve both ac¬ 
curate calculations dUillll], constructing recursive re¬ 
lations iniiiiiiH], and achieving fast look-up m show 
the central place of these symbols in computations. 

Accurate evaluation of Wigner symbols with large j 
using floating-point arithmetic is difficult because of can¬ 
cellation in sums of large alternating terms. The preci¬ 


sion losses due to cancellations can be avoided by using 
integer arithmetic. However, this approach generally in¬ 
volves very large numbers due to factorials, thus requiring 
the use of multi-word integer representations in computer 
codes. Multi-word integers are multiple machine words 
used together to represent arbitrarily large numbers. A 
clever reduction of the huge integers, by expressing the 
terms as products of binomial coefficients, was demon¬ 
strated by L. Wei jH]. In this context, one can also men¬ 
tion early efforts introducing prime factorisation mm 
However, high-accuracy approaches are in general signifi¬ 
cantly slower than straight-forward floating-point imple¬ 
mentations. As an example, the computation of 6j sym¬ 
bols for small angular momenta, max(j) = 3, is already 
an order of magnitude slower when using the published 
code of Ref. [21] as compared to a floating-point calcula¬ 
tion with the popular GNU Scientific Library (GSL) [3|. 
This difference is growing with increasing j. 

In this paper we present a new method in which the 
big-integer penalties are mitigated by making explicit use 
of prime-factorised factorials. In our tests this method 
executes considerably faster than previous high-accuracy 
approaches, provides a bounded and fixed relative accu¬ 
racy, and extends to very large values of ingoing angular 
momenta. 


II. METHOD 

The implementation uses the fact that the xj symbols 
can be written on the form 


where n, s and q are integers. This general expression 
comes directly from the defining equations |19j . E.g., for 
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6 j symbols we have the Racah formula 
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with 

^ ^ (a + 6 — c)!(a — 6 + c )!(6 + c — a)! 
(a + 6 + c + 1 )! 


Oil — a “t“ 6 “t" c, I3i — a b d 

02 = d + e + c, (32 = a + c + d + f, 

03 = a + e + f, ^3 = 6 + c + e + /, 

04. = d b f. 

(4) 

The summation index k is in the range 
max(ai, q; 2 , 0 : 3 , a 4 ) < k < mui{l3i, ^21 Ps)- As stated 
before, the input arguments a, b, c, d, e, and / are integers 
or half-integers. For a symbol not to be trivially zero, 
the sums Oi must be integer and Oi must fulfill triangle 
conditions, e.g. for di: 


and 

di = {a,b,c), 
02 = {d,e,c), 
as = {a,e,f), 
a4 = {d, b, /), 


|a — 6 | < c < a + b. 


(5) 


The overall structure for symbols is similar. Note 
how all expressions contain (factorials of) integers. While 
being possibly large, they can be evaluated exactly. By 
finding and using the least common denominator (LCD) 
of the sum, expression ( 1 ^ can be brought to the form 

0 - 

Finding the LCD is straightforward by expressing each 
factorial as an explicit product of prime numbers, e.g. 
7! = 2'* X 3^ X 5^ X 7^, treated as the tuple (4,2,1,1) 
in the calculations. The tuple representing each term 
is found by subtracting the exponents related to each 
prime number in all factors of the denominator from the 
ones representing the numerator. The LCD is then found 
by taking the minimum value for each exponent over all 
terms. In practice, the LCD is reduced considerably since 
the numerator is also included in this procedure. Factors 
common to all terms are then represented by positive 
exponents. As an explicit example we consider the com¬ 
putation of the 6j symbol { 222 }- 4"^® sum that needs 
to be evaluated for this symbol includes three terms, see 
Table |l] The final common factor of the sum (which in¬ 
cludes the LCD) is found to have the tuple representation 
(1, 2,1,1). The positive exponents are a consequence of 
the relatively large numerators appearing in the sum. 

The common factor is then subtracted from the ex¬ 
ponent tuples of each term, giving pure numerators. In 
Table |l] this step corresponds to the second to last col¬ 
umn. They are converted into exact integers using multi¬ 
word arithmetics. The sum is then easily accumulated, 
yielding one part of n in Eq. Q. 


The product of the A-factors is formed by summing the 
factorial prime exponent contributions. As it shall suf¬ 
fer a square root, one factor will be removed from each 
odd exponent to make sure that all remaining ones are 
even. The removed odd ones will build the term ^/s in 
Eq. Q , while the remaining even ones will be divided by 
two, thereby evaluating the square root. In our example, 
the product under the square root becomes 
that is represented by the tuple (— 1 , — 2 , —I, — 1 )^ = 
(—4, — 8 , —4, —4). All exponents are even so applying the 
square root simply gives (—2, —4, —2, —2)-\/T. 

The common factor is then added, giving final expo¬ 
nents that are used to create q and the second part of n in 
Eq. 0, by using the negative and positive exponents, re¬ 
spectively. In our example, the product of the A-factors 
is added to the common factor of Table |l] giving the final 
exponents (—1, —2, —1, —I), which corresponds to 1/630. 
The two parts of n are combined using multi-word integer 
multiplication. 

In the final step, the value of Wx in Q can be evalu¬ 
ated using floating-point arithmetics with a limited loss 
in relative precision. In our example, this corresponds to 
the evaluation of —27-\/T/630 = —0.0429. The three con¬ 
versions from integer to floating point, together with one 
square root, one multiplication, and one division consti¬ 
tute six operations that each incur the loss of (at most) 
half a least significant bit in the floating-point represen¬ 
tation used, commonly referred to as the machine ep¬ 
silon, e. For the 64-bit floating-point format of IEEE 
758, which is usually used for the C type double, one has 
e = 1.11 • 10“^®. This implies a maximum relative error 
of 6 e = 6.66 • 10 -^®. 


A. Wigner 9j 

Wigner 9j symbols can be written as a sum of prod¬ 
ucts of 6 j symbols, see e.g. Ref. [19]. It is not obvious 
from this direct formula for Wigner 9j symbols that they 
can be written on the form Q due to the square roots in 
the A prefactors of the 6 j symbols. Nevertheless, a re¬ 
arranged formula based on binomials was presented [ 20 ] 
and used [H] by L. Wei. This formula is also employed in 
the present implementation, with the small modification 
of expressing the binomials as factorials. In total, the 9j 
evaluation becomes a double sum where an overall LCD 
expressed on prime-exponent form can be extracted. The 
routine implementing the sum in the 6 j formula is re-used 
for this purpose. 


III. RESULTS 


Calculations using the wigxjpf routine presented in 
this work have been compared (in floating point) with the 
369j code by Wei [2T] and with a simple implementation 
in which we use floating-point operations from precal¬ 
culated factorial tables (see Sec. IV). The floating-point 
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TABLE I. Evaluation of the 6 j symbol {2 2 2; 2 2 2} that illustrates the identification of the common factor (including the LCD) 
and the accumulation of the numerator sum. The fifth column corresponds to the tuple representation of each term, i.e. the 
numerator tuple subtracted by the denominator product tuple. See text for details. 


Term 

Sign Numerator Denominator 

Total ratio 
(prime tuple) 

Final numerator 
(prime tuple) (integer) 

A: = 6 

-hi 

7! 

2!2!2! 

(1,2,1,1) 

(0,0,0,0) 

1 

k = 7 

-1 

8! 

1 

(7, 2,1,1) 

(6,0,0,0) 

-64 

fc = 8 

-hi 

9! 

2!2!2!2! 

(3,4,1,1) 

(2, 2,0,0) 

36 



Gommon factor / LCD: 

(1,2,1,1) 

Sum: 

-27 
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FIG. 1. Upper panel: average evaluation time for 3j symbols using four different implementations: wigxjpf (this work, 
circle markers), 369j (crosses), GSL-1.16 (diamonds), and a simple floating-point computation (plus markers). The 
solid lines correspond to the exhaustive measurements where all non-trivially-zero symbols with a specific maximum j have 
been computed. The dotted lines represent an approximation to the average evaluation time obtained from computations 
of a random subset. The overhead time to enumerate symbols, etc., has been subtracted. Lower panel: maximum relative 
error, as compared to the symbols evaluated using long double on x86 hardware. All tests have been performed using a Xeon 
E3-1240v3, single-threaded @ 3.8 GHz. The dashed line indicates the expected maximum relative error of our implementation, 
which is six times the machine epsilon. 
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based routines of GSL (version 1.16) |1] have also been in¬ 
cluded in the measurements. The comparisons are both 
in terms of execution speed and accuracy. The bench¬ 
mark calculations for measuring the accuracy are per¬ 
formed with our routine, wigxjpf, using extended pre¬ 
cision computations (the 80 bit long double of the x87 
floating-point unit, e = 5.42 • 10“^°). This benchmark 
is reliable since only the last floating-point stage of the 
current routine introduces rounding errors. 

Results are presented in Fig. [l] and Fig. for 3j and 
(6j, 9j) symbols, respectively. For reasonable maximum 
j, all symbols not trivially zero by triangle conditions 
have been calculated (solid lines), which provides an ex¬ 
haustive measurement of the average evaluation speed 
and the maximum relative error. However, for larger 
symbols the measurement is approximative as we have 
restricted the computations to a random subset of non- 
trivially-zero symbols (dotted lines with markers). 

In all measurements, the present work is slightly more 
accurate and up to an order of magnitude faster than 
369j. The relative error never exceeded 6e « 6.66-10“^®, 
which is our upper bound. The slightly larger accuracy 
loss in 369j is due to individual conversion of each prime 
power to floating point and subsequent multiplication. 
The execution time difference between our implementa¬ 
tion and the fast floating-point based routines is less than 
an order of magnitude at low j, showing that the wigxjpf 
routines presented in this work are also very efficient. In 
comparison to GSL (version 1.16), wigxjpf is actually 
faster for 3j symbols up to max(j) = 60, while the dif¬ 
ference in execution speed always remains smaller than a 
factor 4 up to max(j) = 20 (30) for 6j {9j) symbols. At 
large j the floating-point routines obviously suffer from 
an increasing loss of accuracy, thus making wigxjpf su¬ 
perior. 

Explicit numerical values, execution times, and mem¬ 
ory usage for a selected set of symbols are presented in 
Table [n] Note that we present results for extreme cases 
such as j = 50, 000 and j = 2, 000 for 6j and 9j symbols, 
respectively. 


IV. IMPLEMENTATION 


Before the calculation of individual symbols, a table 
with precalculated factorisations of factorials is prepared. 


Table III shows the maximum possible factorial that has 
to be computed for evaluation of different symbols. The 
time spent on this initialization stage is amortized for 
applications that make repeated use of the routines. The 
table also gives the maximum number of terms in the 
sum (inner sum for 9j symbols), for each of which tem¬ 
porary results (numerator-denominator exponent tuples) 
are stored during execution. 

First, all prime numbers up to the largest factorial ar¬ 
gument, p, are determined by the sieve of Eratosthenes. 
By simple enumeration of integer exponent combinations 
in the basis of prime numbers, the factorisation of all in¬ 


tegers up to p are determined. Thus the factorisations are 
performed without any ’test’ divisions, i.e., checks for re¬ 
mainder zero. Each factorisation is stored as an array of 
integers. The factorisation of factorials are constructed 
by cumulating the previous factorisations. 

Multiplying (or dividing) two such factorisations is a 
matter of adding (or subtracting) each element of the 
arrays, giving a new array. Adding or subtracting two 
factorisations would be more complicated as the result, 
expressed as a factorisation, could be completely differ¬ 
ent, even requiring much larger prime factors. To avoid 
this, each term is converted from the prime-factor ar¬ 
ray representation to multi-word integer before addition 
or subtraction. No attempt is made to extract common 
factors between different multi-word integers, as this is 
expensive and would have no impact on accuracy. 

The multi-word integers are just multiple machine 
words (32 or 64 bits) used to represent arbitrarily large 
numbers. The highest word is treated as having a sign 
bit, while all others are unsigned. Routines that per¬ 
form addition, subtraction and multiplication are imple¬ 
mented. The main difficulty is to propagate overflow (i.e. 
carry) between the words. The execution times for ad¬ 
dition and subtraction are linear in the number of words 
used, while multiplication is quadratic. 

The conversion of a number in prime-factor form to 
multi-word integer form evaluates the contribution for 
each prime number separately, before multiplying them 
together. We utilize the fact that, for each exponent, each 
successively higher bit in its binary number representa¬ 
tion corresponds to the square of the value represented 
by the previous bit. Therefore, two values are kept while 
iterating through the bits of each exponent: the repeat¬ 
edly squared prime number and a cumulative product. 
While non-zero bits remain, the first number is squared 
each iteration, starting as the prime number for the first 
bit. For each bit set to one, the cumulative product is 
multiplied by the hitherto squared result. These inter¬ 
mediate results are treated as multi-word integers when 
necessary. 

Conversion from multi-word integer to floating point 
in the final stages is done word by word, from lower to 
higher words. The contributions that cannot be repre¬ 
sented (i.e. that are below the precision) in the resulting 
floating-point variable are therefore just lost/ignored. 

Repeated similar calculations of factorial arguments 
can be avoided by noting that the factors (and denomi¬ 
nators) of each successive term in the sum depend on the 
summation index k in such a way that they always change 
their argument by 1 for each iteration. Therefore, facto¬ 
rial arguments (table addresses) can be calculated before 
the sum loop. In the actual loop these addresses are then 
just moved up or down one step each iteration. 

Symbols that are trivially zero can be identified using 
a single if-clause in combination with two’s-complement 
arithmetics and bit manipulation. Only symbols passing 
this test are evaluated. 

For the comparisons presented in Sec. IB we have 
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FIG. 2. 


Same as Fig. [^ but for the evaluation of 6j (left panels) and 9j symbols (right panels). 


TABLE II. Values and execution times for selected xj symbols. These times include setup of the precalculated tables of factorised 
factorials. The memory used for those tables and other temporary storage is also given. All tests have been performed using a 
Xeon E3-1240v3, single-threaded @ 3.8 GHz. 



Symbol 

Value 

Time 

Memory 

3j 

f 15 30 40 \ 

1 2 2 -4 j 

-0.01908157979919155 

0.07 ms 

58 kB 


200 200 200 
-10 60 -50 
50,000 50,000 50,000 
1,000 -6,000 5,000 


0.0007493927313989515 0.31 ms 730 kB 


-1.116843916927519e-05 112 s 19 GB 


6j 


[8 8 8] 
[ 8 8 8 J 

[ all 200 } 

[ 

-0.01265208072315355 

0.0001559032124132416 

0.06 ms 

0.62 ms 

7.0 kB 

1.0 MB 


{ all 600 } 

-1.03981778344144e-07 

6.1 ms 

8.0 MB 


{ all 10,000 } 

2.770313640470537e-08 

8.2 s 

1.5 GB 


{ all 50,000 } 

3.997351841910046e-08 

816 s 

32 GB 


9j 


8.5 

9.5 7.0 ' 

1 



12.5 

8.0 8.5 

^ 0.0002812983019125448 

0.08 ms 

20 kB 

8.0 

10.5 9.5 ^ 

1 



100 

80 50 1 




50 

100 70 

i 1.055977980657612e-07 

1.9 ms 

0.50 MB 

60 

50 100 J 




{ all 200 } 

1.278335300545066e-07 

0.15 s 

1.6 MB 

{ all 1,000 } 

1.749851385596156e-09 

29.8 s 

30 MB 

{ all 2,000 } 

2.755181565857189e-10 

358 s 

109 MB 
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TABLE III. Maximum factorial argument and number of 
terms given the maximum j in a symbol. The limits can 
be rounded down when non-integer. These limits must be 
considered for temporary memory allocation. The memory 
requirement for the precalculated factorial tables are given in 
the last columns for three different values of max(j). 


Symbol 

Maximum 

p\ terms 

Memory requirement 
j = 100 j = 1,000 j = 10,000 

3j 

(3i + 1)! j + 1 

193 kB 10.8 MB 0.78 GB 

6j 

(4j + l)! j + l 

309 kB 17.9 MB 1.35 GB 

9j 

{5j + l)\ j + l 

450 kB 27.5 MB 2.06 GB 


also implemented simple floating-point routines. They 
use precalculated floating-point factorial values employ¬ 
ing the same list-address technique for the sum as de¬ 
scribed above and thus become very fast. The factorials, 
stored as floating-point values, are calculated precisely 
using the multi-word integers. The accuracy loss is com¬ 
pletely dominated by cancellation in the sums of large 
alternating terms. 


A. Memory usage 


The largest memory requirement of wigxjpf is asso¬ 
ciated with the tables of precalculated factorisations of 
numbers and factorials, see Table III The next largest 
use is the temporary arrays of prime factors of each term 
in the sum. Other variables that are used during compu¬ 
tations, e.g. multi-word integers, are comparatively small 
in size. 


V. CONCLUSIONS AND OUTLOOK 

In conclusion, we have successfully implemented a 
computational method for fast and accurate evaluation of 


Wigner 3j, 6j, and 9j symbols using prime factorisation 
and multi-word integer arithmetics. The efficiency and 
accuracy of our approach has been validated via bench¬ 
mark calculations and comparison with existing floating¬ 
point routines and with the 369j code by Wei [H]. The 
latter algorithm is similar to our approach in the sense 
that it also uses prime-number factorisation and multi¬ 
word integer arithmetics. However, it employs recursive 
calculation of binomial coefficients while our implemen¬ 
tation, with the precalculation of a table with prime- 
number factorisations, offers a code that is significantly 
faster; comparable in speed even to the floating-point 
routines. In addition, the precomputed factorisation ta¬ 
ble makes our approach particularly amenable to applica¬ 
tions in which multiple evaluations of angular momentum 
coupling coefficients are needed. 

Furthermore, our implementation produces results 
with a maximum relative error of only six times the ma¬ 
chine epsilon for 3j, 6j, and 9j symbols, also for very 
large angular momenta. In comparison, the error in the 
evaluation of 6j symbols using GSL 1.16 is five orders of 
magnitude larger already for j = 15 and quickly grows 
with increasing values of the angular momenta. 

A computer code with the routines presented in this 
work, wigxjpf, is available for download [1]. 
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